Clinical Chemistry — Latest Matching Preprints

1

Nanopore Whole-Genome Sequencing for Rapid, Comprehensive Molecular Diagnostics of Brain Tumors in Adult Patients

Halldorsson, S.; Nagymihaly, R. M.; Bope, C. D.; Lund-Iversen, M.; Niehusmann, P.; Lien-Dahl, T.; Pahnke, J.; Bruning, T.; Kongelf, G.; Patel, A.; Sahm, F.; Euskirchen, P.; Leske, H.; Vik-Mo, E. O.

2026-04-24 pathology 10.64898/2026.04.23.26351563 medRxiv

Top 0.1%

6.4%

Background: Classification of central nervous system (CNS) tumors has become increasingly complex, raising concerns about the sustainability of comprehensive molecular diagnostics. We evaluated nanopore whole genome sequencing (nWGS) as a single workflow to replace multiple diagnostic assays. Methods: We performed nWGS on DNA extracted from 90 adult CNS tumor samples (58 retrospective, 32 prospective) and compared the results to findings from standard of care (SoC) diagnostic work-up. Analysis was done through an automated workflow that consolidated diagnostically and therapeutically relevant genomic alterations, including copy-number variations, structural variants, single-nucleotide variants, chromosomal aberrations, gene fusions, and methylation-based classification. Results: nWGS supported final diagnostic classification in all samples with >15% tumor cell content, requiring ~3 hours of hands-on library preparation, parallel sample processing, and sequencing times within 72 hours. Methylation-based classification was available within 1 hour and was concordant with the integrated final diagnosis in 89% of cases (80/90). All diagnostically relevant copy-number variations, single-nucleotide variants, and gene fusions were concordant with SoC testing. MGMT promoter methylation status matched in 94% of cases. In addition, nWGS identified prognostic and potentially actionable variants that were not reported or covered by SoC. Conclusions: nWGS delivers comprehensive genetic and epigenetic results with a fast turn-around compared to standard methods. This enables efficient, accurate, and scalable molecular diagnostics of CNS tumors using a single platform. These data support its implementation in routine clinical practice, and the approach may be extended to other cancer types requiring complex genomic profiling.

2

Attention-Guided CNN Ensemble for Binary Classification of High-Grade and Low-Grade Serous Ovarian Carcinoma from Histopathological WSI Patches

Rani, A.; Mishra, S.

2026-04-22 oncology 10.64898/2026.04.21.26351441 medRxiv

Top 0.2%

2.8%

Accurate histopathological differentiation between High-Grade Serous Carcinoma (HGSC) and Low-Grade Serous Carcinoma (LGSC) remains a critical yet challenging aspect of ovarian cancer diagnosis due to their similar morphology and different clinical outcomes. This study presents a deep learning framework that uses custom attention mechanisms, including the Convolutional Block Attention Module (CBAM), Squeeze-and-Excitation (SE) blocks, and a Differential Attention module within five CNN architectures for automated binary classification of ovarian cancer subtypes from H&E WSI patches. Although individual models achieved higher accuracy, the ensemble stacking framework with a shallow MLP meta-learner delivered the best overall performance, with a ROC-AUC of 0.9211, an accuracy of 0.85, and F1-scores of 0.84 and 0.85 across both subtypes. These findings demonstrate that attention-guided feature recalibration combined with ensemble stacking provides robust and clinically interpretable discrimination of ovarian carcinoma subtypes.

3

Diagnostic Classification for Long Covid Patients identifying Persistent Virus and Hyperimmune Pathophysiologies

James-Pemberton, P.; Harper, D.; Wagerfield, P.; Watson, C.; Hervada, L.; Kohli, S.; Alder, S.; Shaw, A.

2026-04-22 infectious diseases 10.64898/2026.04.21.26351402 medRxiv

Top 0.2%

2.7%

A multiplex diagnostic test is evaluated for self-reported long COVID associated persistent symptoms and a poor recovery from a SARS-CoV-2 infection. A mass-standardised concentration of total antibodies (AC), high-quality (HQ) antibodies and percentage of HQ antibodies (HQ%) is assessed against a spectrum of spike proteins to the SARS-CoV-2 variants: Wuhan, , δ, and the Omicron variants BA.1, BA.2, BA.2.12.1, BA.2.75, BA.5, CH.1.1, BQ.1.1 and XBB.1.5 in three cohorts. The cohorts included recovered control patients (CC, n = 46) and self-declared long COVID patients (LCC, n = 113). A nested Receiver Operating Characteristic (ROC) analysis, performed for the variant with lowest HQ concentration in the spectrum, produced an area under the curve (AUC) of 0.61 (0.53-0.70) for the CC vs LCC cohorts. For the LCC cohort, cut-off thresholds of AC = 0.8 mg/L, HQ = 1.5 mg/L and HQ% = 34% were determined, leading to a 71% sensitivity and 66% specificity derived by the Youden metric. The cohorts may be fully classified based on ROC and outlier analysis, giving an incidence of 62% (95% CI 52%-71%) for persistent virus, 12% (95% CI 7%-20%) for hyperimmune, and 26% (95% CI 18%-35%) unclassified. The overall diagnostic accuracy for both the hyper- and hypo-immune classes is 69%. All clinical interventions can now be tailored for the heterogeneous long COVID patient cohort.
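As a reading aid, the Youden metric mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration that uses only the reported sensitivity and specificity. The function name `youden_j` is hypothetical and not taken from the preprint, and the sketch does not reproduce the authors' threshold search.

```python
def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1.

    J ranges from 0 (no better than chance) to 1 (perfect separation).
    """
    return sensitivity + specificity - 1.0


# Sensitivity and specificity as reported in the abstract for the LCC cut-offs.
j = youden_j(0.71, 0.66)
print(f"Youden J = {j:.2f}")
```

In a full ROC analysis the cut-off is usually chosen as the threshold that maximises J, but the abstract reports only the resulting operating point.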

4

Integrating α-Synuclein Seeding Activity (SAA) into routine practice: insights from the multicenter ALZAN Cohort

Jourdan, O.; Duchiron, M.; Torrent, J.; Turpinat, C.; Mondesert, E.; Busto, G.; Morchikh, M.; Dornadic, M.; Delaby, C.; Hirtz, C.; Thizy, L.; Barnier-Figue, G.; Perrein, F.; Jurici, S.; Gabelle, A.; Bennys, K.; Lehmann, S.

2026-04-23 neurology 10.64898/2026.04.21.26351389 medRxiv

Top 0.4%

1.6%

Objectives: To evaluate the diagnostic performance of the α-synuclein seed amplification assay (SAA) and characterize the impact of α-synuclein co-pathology on cognitive and biological profiles in routine clinical practice. Methods: We included 398 patients from the prospective multicenter ALZAN cohort recruited from memory clinics in Montpellier, Nimes, and Perpignan. All participants underwent CSF and blood sampling with measurement of CSF biomarkers (Aβ42/40, tau, ptau181) and plasma biomarkers (Aβ42/40, ptau181, ptau217, GFAP, NfL). Cognitive assessment was performed using the Mini-Mental State Examination (MMSE). Clinical diagnoses were independently confirmed by two senior neurologists. αSyn status was determined by SAA (RT-QuIC). Results: Of 398 patients, 19 out of 20 patients with Lewy body dementia (LBD) (95.0%) and 32 out of 203 patients with AD (15.8%) were SAA+. SAA-positivity presented a sensitivity of 95% and a specificity of 93.5% for distinguishing LBD from patients without LBD or AD. In the entire cohort, SAA+ patients showed lower MMSE scores (p<0.01), lower CSF Aβ42/40 ratio (p<0.01), and elevated plasma GFAP (p<0.05). Within the AD subgroup, no significant differences in CSF or blood biomarkers were observed between SAA+ and SAA- patients, except for a lower CSF Aβ42/40 ratio in SAA+ patients (p<0.01). Interpretation: SAA demonstrates good diagnostic capabilities for detecting LBD and confirms notable αSyn co-pathology in AD. This study highlights the limitations of routine CSF and emerging blood biomarkers in capturing αSyn pathology and the value of integrating SAA into routine neurodegenerative disease assessment.

5

Breath aerosol PCR for detection of lower respiratory tract infections: Evaluation of a non-invasive face mask collector in pneumonia patients

Tiseo, K.; Dräger, S.; Santhosh Kumar, H.; Alkhazashvili, M.; Hammann, A.; Risch, P.; Willi, R.; Mkhatvari, T.; Fialova, C.; Adlhart, C.; Szabo, D.; Suknidze, M.; Patchkoria, I.; Broger, T.; Ivanova Reipold, E.; Varshanidze, K.; Osthoff, M.

2026-04-21 infectious diseases 10.64898/2026.04.18.26351117 medRxiv

Top 0.5%

1.4%

Etiological diagnosis of lower respiratory tract infections (LRTIs) relies on sputum or bronchoalveolar lavage (BAL), which may be difficult to obtain or invasive. Exhaled breath aerosol (XBA) sampling offers a non-invasive alternative for pathogen detection. We evaluated the performance of the AveloMask, a face mask-based device designed to capture XBAs for molecular testing. In this prospective paired-sample study, hospitalized adults with pneumonia at three hospitals in Switzerland and Georgia provided an XBA sample using the AveloMask and a lower respiratory tract (LRT) specimen (sputum or BAL). XBA samples were analyzed by multiplex PCR using the Roche LightMix® panel and LRT samples were tested using the BioFire® FilmArray® Pneumonia Panel. Concordance between XBA and LRT samples was assessed using positive percent agreement (PPA), negative percent agreement (NPA), and overall percent agreement (OPA). Ninety-three participants were enrolled and 63 participants provided paired samples. AveloMask sampling identified the dominant pathogen (lowest Ct value in the LRT sample) in 40/47 LRT-positive cases (85.1%). Across all targets, PPA was 61% (95% CI, 50-72%), NPA was 100% (95% CI, 99-100%), and OPA was 95% (95% CI, 92-96%). PPA was higher for bacteria than for viruses and lower PPA was largely driven by reduced detection of low-abundance or co-infecting pathogens. In a subset analysis, AveloMask results showed substantial overlap with standard-of-care testing and could have supported antimicrobial de-escalation. Breath aerosol sampling using the AveloMask enabled non-invasive molecular detection of LRT pathogens in pneumonia cases and may complement conventional standard-of-care testing, particularly when sputum is unavailable.

6

Practical Management of Adverse Events Associated with Bispecific Antibodies for the Treatment of Multiple Myeloma: A Qualitative Interview Study

Graham, T. R.; White, M. G.; Blue, B.; Hartley-Brown, M.; Hunter, B. D.; Huynh, C.; Joseph, N.; Keruakous, A.; Pan, D.; Rudolph, P.; Sawhney, R.; Suvannasankha, A.

2026-04-27 oncology 10.64898/2026.04.24.26350878 medRxiv

Top 0.6%

1.0%

PURPOSE: Bispecific antibodies (BsAbs) represent a major advancement in the management of relapsed/refractory multiple myeloma (RRMM), offering high response rates even in heavily pretreated patients. However, their use presents operational, safety, and supportive care complexities that require coordinated care teams and evolving infrastructure. This manuscript summarizes best practice recommendations for adverse event (AE) management, outpatient operational models, referral pathways, and emerging strategies to optimize long-term tolerability. METHODS: Medlive, A PlatformQ Health Brand, conducted qualitative interviews of academic and community-based clinicians. Discussions focused on BsAb implementation, patient selection and counseling, and AE management. Experts provided recommendations on team-based protocols, transitions of care, and inpatient versus outpatient considerations. RESULTS: Ten hematologists/oncologists (academic n=4; community n=6) described practice patterns, barriers, and perspectives on BsAb use. BsAbs were consistently regarded as highly effective across multiple lines of therapy, particularly for patients without alternatives. Cytokine release syndrome (CRS) was the most common acute toxicity, generally low grade and managed effectively with early tocilizumab, including prophylactic use in outpatient settings. Immune effector cell-associated neurotoxicity syndrome (ICANS) was rare, mild, and best mitigated through early recognition and caregiver support. Infections, largely from BCMA-associated hypogammaglobulinemia, frequently interrupted therapy, necessitating antiviral prophylaxis, Pneumocystis jirovecii pneumonia (PJP) prophylaxis, and intravenous immunoglobulin (IVIG). Outpatient step-up dosing is expanding, supported by prophylactic strategies and academic-community collaboration. Timely referral was emphasized to preserve eligibility. Major outpatient challenges included sequencing, infrastructure readiness, and standardized caregiver and staff education. CONCLUSION: Effective community implementation of BsAbs requires multidisciplinary coordination, standardized AE protocols, infection prevention, and infrastructure to support monitoring, referrals, and equitable access. These measures are critical to ensure safe, sustainable integration of bispecific therapies and to optimize patient outcomes.

7

Scaling Multiplex qPCR Primer Design to 1000-plex using the Degenerate Incomplete Multiplex Primer List Extension (DIMPLE) Algorithm

Pinto, A.; Dong, X.; Wu, W.; Johnson, S. J.; Wen, Q.; Zhang, C.; Havey, J.; Wang, B.; Tang, G.; Farhat, A.; Zhang, D. Y.; Issa, G. C.; Zhang, X.

2026-04-21 bioengineering 10.64898/2026.04.17.719221 medRxiv

Top 0.6%

1.0%

Massively multiplexed qPCR is primarily constrained by increasing primer dimer formation as the number of distinct primers in a single reaction increases. Previous multiplex primer design algorithms either fail to sufficiently suppress primer dimers at 100+ plex or require excessive computational resources to complete. Here, we present DIMPLE, a linear-runtime primer design algorithm that effectively generates 10,000+ primers to amplify thousands of potential amplicons in a single qPCR reaction. As one clinical demonstration of this algorithm, we designed an assay to detect 2,302 distinct KMT2A gene fusion subtypes using 204 primers in a single tube. In contrast to FISH and conventional NGS approaches with a 2% variant allele frequency (VAF) limit of detection, our DIMPLE qPCR assay was able to analytically detect gene fusions down to 0.05% VAF. We also constructed proof-of-concept multiplex qPCR panels for additional oncology gene fusions, multiplex pathogen detection, and DNA methylation markers. The scalability and low computational cost of DIMPLE are complementary to new instrument platforms for massively multiplex qPCR readout, enabling rapid, point-of-care nucleic acid testing.

8

Comparing Gleason Pattern 4 Measurement Approaches on Prostate Biopsy Using Machine Learning: A Proof-of-Principle Study

Buzoianu, M. M.; Yu, R.; Assel, M.; Bozkurt, A.; Aghdam, H.; Fine, S.; Vickers, A.

2026-04-24 oncology 10.64898/2026.04.23.26351615 medRxiv

Top 0.6%

0.9%

Objective: To demonstrate the proof of principle that machine learning (ML) can be used to quantify Gleason Pattern (GP) 4 on digitized biopsy slides using multiple measurement approaches, allowing direct comparison of their prognostic performance. Methods: We assembled a convenience sample of 726 patients with grade group 2-4 prostate cancer on systematic biopsy who underwent radical prostatectomy between 2014 and 2023. Digitized biopsy slides were analyzed using a machine-learning algorithm (PAIGE-AI) to quantify GP4 using multiple measurement approaches, particularly with respect to how gaps between cancer foci (interfocal stroma) were handled. GP4 extent was quantified using linear measurements or a pixel-based area metric. Discrimination of each GP4 quantification approach, along with Grade Group (GG), was assessed for adverse radical prostatectomy pathology and biochemical recurrence (BCR). Results: We identified 15 different quantification approaches and observed differences in their discrimination. The highest discrimination was in the pixel-counting method (AUC 0.648). GP4 quantification outperformed GG for predicting adverse pathology (AUC 0.627 vs 0.608). Amount of GP3 was non-predictive once GP4 was known. These findings were consistent for BCR. Conclusions: We were able to measure slides using 15 distinct measurement approaches and replicated prior findings using ML to quantify GP4. Our findings support the use of ML as a research tool to compare different GP4 quantification approaches. We intend to use our method on larger cohorts to determine which measurement approach best predicts oncologic outcome.

9

A prognostic signature based on ectopic reactivation of eight tissue-specific genes in Diffuse Large B Cell Lymphoma.

Montaut, E.; Rainville, V.; Betton-Fraisse, P.; Merre, W.; Khedimallah, S.; Govin, J.; Rousseaux, S.; Khochbin, S.; Jardin, F.; Ruminy, P.; Bourova-Flin, E.; Emadali, A.; Carras, S.

2026-04-27 hematology 10.64898/2026.04.23.26351580 medRxiv

Top 0.7%

0.9%

Diffuse Large B-cell lymphoma (DLBCL) is the most common aggressive lymphoma in the Western world. First-line immunochemotherapy fails in approximately 30-40% of patients, with refractory and relapsed patients presenting a dismal prognosis. Currently, these high-risk patients cannot be accurately identified at diagnosis. Using statistical modeling and machine learning approaches applied to large public DLBCL datasets, we identified a novel predictive signature based on the reactivation of eight normally silent tissue-dependent genes associated with survival. We then developed a multiplex RT-MLPseq based assay, compatible with formalin-fixed paraffin-embedded (FFPE) samples and transferable into routine clinical practice, enabling analysis of expression of these eight genes, and validated their prognostic impact in an independent real-life cohort. This signature could be integrated with current prognostic indices and molecular classifications to improve patient stratification and guide treatment selection toward a personalized theragnostic approach, thereby enhancing management of non-responder patients.

10

Unraveling the potential of short and long read sequencing for human genome profiling

Leduc, A.; Bachr, A.; Sandron, F.; Delepine, M.; Delafoy, D.; Fund, C.; Daviaud, C.; Meslage, S.; Turon, V.; Bacq-Daian, D.; Rousseau, F.; Olaso, R.; Deleuze, J.-F.; Gerber, Z.; Meyer, V.

2026-04-22 genomics 10.64898/2026.04.20.719568 medRxiv

Top 0.8%

0.8%

Background: Short read sequencing technologies have dominated the field of human whole genome sequencing in the past years in terms of cost, throughput, and accuracy. However, thanks to recent technological evolution, long read approaches have become increasingly competitive and complementary to short reads. With the gap in the cost per genome closing slowly between both approaches, long reads might replace short read sequencing in future research and clinical applications. Still, comprehensive evaluation is necessary to conclude on the performance and general advantages of each technology. Results: In this study, we compared the latest chemistries of major suppliers of short and long read technologies: Illumina short reads, Illumina Complete Long Reads (ICLR), Pacific Biosciences HiFi reads (PacBio), and Oxford Nanopore Technologies long reads (ONT). Using the HG002 human reference sample and established bioinformatics guidelines, we assessed their variant calling performance against the latest available truth sets at different levels of coverage. For single nucleotide variant detection, all technologies were equivalent. Despite the latest improvements in chemistry, indel calling with ONT continues to lag in accuracy behind other technologies. In contrast, long reads delivered a clear advantage in structural variant detection, surpassing short reads in both accuracy and sensitivity. The hybrid ICLR approach achieved intermediate performance, narrowing the gap between short and long read sequencing. Furthermore, long reads enhanced haplotype-phasing resolution, enabling the phasing of over 80% of the genome. Conclusions: These findings highlight the specific strengths and limitations of recent sequencing technologies, aiding the decision-making in future research projects, technological platforms development, and clinical applications.

11

Spatial profiling of CAR protein organization reveals in vivo remodeling during CAR-T therapy

Kashima, Y.; Makishima, K.; van Ooijen, H.; Franzen, L.; Petkov, S.; Nishikii, H.; Zenkoh, J.; Suzuki, A.; Branting, A.; Sakata-Yanagimoto, M.; Suzuki, Y.

2026-04-22 genomics 10.64898/2026.04.20.719384 medRxiv

Top 0.8%

0.8%

Chimeric antigen receptor (CAR) T cell therapy utilizes genetically engineered patient-derived T cells to target cancer cells. Despite its clinical successes in multiple cancer types, the underlying molecular mechanisms by which molecules on CAR-T cells and surrounding cells interact with other proteins and collectively determine treatment efficacy remain elusive. Most previous studies have relied on transcriptome profiling, which does not fully reflect protein-level organization and interactions. In this study, we developed an antibody-oligonucleotide conjugate targeting the FMC63 region of CAR and integrated it into molecular pixelation (MPX). This approach enabled profiling of the dynamics of CAR molecules on cell surfaces as well as their colocalization with other proteins at the single-cell level. By applying MPX to longitudinal samples from three patients undergoing CAR-T cell therapy, we characterized the dynamic changes in CAR-associated protein organization in both pre-infusion CAR products and post-infusion peripheral blood. While CAR protein abundance and polarization showed limited variation across clinical courses, remodeling of a CAR-centered co-localization network was observed over time, including different retentions of specific molecular associations between patients with different clinical outcomes. Although derived from a limited cohort, our study identifies insights from this methodological framework beyond those gained by conventional omics analyses and offers results of a systematic investigation to predict and enhance CAR therapeutic outcomes. Key points: Molecular pixelation was applied for chimeric antigen receptor (CAR) profiling at single-molecule and single-cell resolutions. Protein and transcriptome analyses of the CAR molecule showed dynamic remodeling during CAR-T therapy in patients with non-Hodgkin lymphoma.

12

CT-Based Deep Foundation Model for Predicting Immune Checkpoint Inhibitor-Induced Pneumonitis Risk in Lung Cancer

Muneer, A.; Showkatian, E.; Kitsel, Y.; Saad, M. B.; Sujit, S. J.; Soto, F.; Shroff, G. S.; Faiz, S. A.; Ghanbar, M. I.; Ismail, S. M.; Vokes, N. I.; Cascone, T.; Le, X.; Zhang, J.; Byers, L. A.; Jaffray, D.; Chang, J. Y.; Liao, Z.; Naing, A.; Gibbons, D. L.; Vaporciyan, A. A.; Heymach, J. V.; Suresh, K. S.; Altan, M.; Sheshadri, A.; Wu, J.

2026-04-23 oncology 10.64898/2026.04.21.26351428 medRxiv

Top 0.8%

0.8%

Background: Immune checkpoint inhibitors (ICIs) have revolutionized cancer therapy but can cause serious immune-related adverse events (irAEs), with pneumonitis (ICI-P) being among the most severe. Early identification of high-risk patients before ICI initiation is critical for closer monitoring, timely intervention, and improved outcomes. Purpose: To develop and validate a deep learning foundation model to predict ICI-P from baseline CT scans in patients with lung cancer. Methods: We designed the Checkpoint-Inhibitor Pneumonitis Hazard EstimatoR (CIPHER), a deep learning foundation model that combines contrastive learning with a transformer-based masked autoencoder to predict ICI-P from baseline CT scans in patients with lung cancer. Using self-supervised learning, CIPHER was pre-trained on 590,284 CT slices from 2,500 non-small cell lung cancer (NSCLC) patients to capture heterogeneous lung parenchymal patterns. After pre-training, the model was fine-tuned on an internal NSCLC cohort for ICI-P risk prediction, using images from 254 patients for model development and 93 patients for internal validation. We compared CIPHER with classical radiomic models and further evaluated it on an external NSCLC cohort of 116 patients. Results: In the internal immunotherapy cohort, CIPHER consistently distinguished patients at elevated risk of ICI-P from those without the event, with AUCs ranging from 0.77 to 0.85. In head-to-head benchmarking, CIPHER achieved an AUC of 0.83, outperforming the radiomic models. In the external validation cohort, CIPHER maintained strong performance (AUC = 0.83; balanced accuracy = 81.7%), exceeding the radiomic models (DeLong p = 0.0318) and demonstrating higher specificity without sacrificing sensitivity. By contrast, the radiomic model showed high sensitivity (85.0%) but markedly lower specificity (45.8%). Confusion matrix analysis confirmed the robust classification performance of CIPHER, correctly identifying 80 of 96 non-ICI-P cases and 16 of 20 ICI-P cases. Conclusions: We developed and externally validated CIPHER for predicting future risk of ICI-P from pre-treatment CT scans. With prospective validation, CIPHER may be incorporated into routine patient management to improve outcomes.
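The confusion-matrix counts quoted above are enough to recompute sensitivity, specificity, and balanced accuracy. The following is a minimal sketch using only the counts stated in the abstract (80 of 96 non-ICI-P and 16 of 20 ICI-P cases correct). The helper name `binary_rates` is illustrative and does not come from the preprint.

```python
def binary_rates(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity and balanced accuracy from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Balanced accuracy is the mean of the two class-wise recalls.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# External cohort counts from the abstract: 16/20 ICI-P and 80/96 non-ICI-P correct.
external = binary_rates(tp=16, fn=4, tn=80, fp=16)
for name, value in external.items():
    print(f"{name}: {value:.1%}")
```

The mean of 16/20 and 80/96 rounds to 81.7%, which agrees with the balanced accuracy reported for the external validation cohort.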

13

Determinants of DNA-sequence-based Diagnostic Yield in the CSER Consortium

Mavura, Y.; Crosslin, D.; Ferar, K. D.; Lawlor, J. M.; Greally, J. M.; Hindorff, L.; Jarvik, G. P.; Kalla, S.; Koenig, B. A.; Kvale, M.; Kwok, P.-Y.; Norton, M.; Plon, S. E.; Powell, B. C.; Slavotinek, A.; Thompson, M. L.; Popejoy, A. B.; Kenny, E. E.; Risch, N.

2026-04-22 genetic and genomic medicine 10.64898/2026.04.20.26351140 medRxiv

Top 1.0%

0.7%

Purpose: Diagnostic yield from exome and genome sequencing varies widely across studies. It remains unclear how much of this variation reflects patient-level factors (e.g., sex, clinical features, race/ethnicity, genetic ancestry) versus site-level practices such as sequencing modality or variant interpretation workflows. We aimed to quantify the contributions of these factors to diagnostic outcomes across five U.S. clinical sequencing sites. Methods: We performed a cross-sectional analysis of 3,008 prenatal, neonatal, and pediatric cases from the NHGRI Clinical Sequencing Evidence-Generating Research (CSER) consortium (2017-2023). Clinical indications spanned neurodevelopmental, neurological, immunological, metabolic, craniofacial, skeletal, cardiac, prenatal, and oncologic presentations. Genetic ancestry was inferred from sequencing data, and variants were interpreted using ACMG/AMP guidelines to classify DNA-based diagnoses. Generalized linear mixed models were used to estimate associations between diagnostic yield and fixed effects (sex, prenatal status, isolated cancer, number of clinical indications, sequencing modality, race/ethnicity, and genetic ancestry), while modeling study site as a random effect to quantify between-site variation. Results: The overall diagnostic yield was 19.0%. Multiple clinical indications (OR=1.47, 95% CI 1.20-1.80, p<0.001) were associated with higher diagnostic yield, and male sex (OR=0.80, 95% CI 0.66-0.96, p=0.017) and prenatal status (OR=0.63, 95% CI 0.44-0.90, p=0.012) were associated with lower yield. Sequencing modality, race/ethnicity, genetic ancestry, and isolated cancer were not statistically significantly associated with diagnostic outcomes. A model without fixed effects attributed ~10% of variance in diagnostic yield to between-site differences. After adjusting for covariates, site-level variance decreased to 5.7%, indicating consistent variation across sites not explained by measured patient factors. Conclusion: Across five sites, patient-level clinical features influenced diagnostic yield, but substantial site-level variation remained even after adjustment. Differences in variant interpretation or case-classification practices may contribute to this residual variability. Further efforts to increase consistency in exome- and genome-sequencing diagnostic workflows may help reduce inter-site differences.
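The odds ratios and confidence intervals above can be related to approximate p-values with a short calculation. This is only a sketch: it assumes a Wald-type 95% interval that is symmetric on the log-odds scale. The abstract does not state how the authors obtained their intervals or p-values, so the back-calculated figures are approximate and need not match the reported ones exactly. The function name `wald_p_from_ci` is hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_p_from_ci(odds_ratio: float, lower: float, upper: float) -> float:
    """Approximate two-sided p-value from an OR and its 95% CI.

    Assumes the interval is exp(log(OR) +/- Z_95 * SE), i.e. a Wald interval.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Estimates as reported in the abstract (rounded to two decimals there).
reported = {
    "multiple indications": (1.47, 1.20, 1.80),
    "male sex": (0.80, 0.66, 0.96),
    "prenatal status": (0.63, 0.44, 0.90),
}
approx_p = {name: wald_p_from_ci(*values) for name, values in reported.items()}
for name, p in approx_p.items():
    print(f"{name}: approximate p = {p:.4f}")
```

Because the inputs are rounded, small differences from the published p-values are expected. The sketch is meant to show the direction and rough size of each effect, not to audit the analysis.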

14

Molecular epidemiology of rifampicin resistant Mycobacterium tuberculosis in Vietnam

Solomon, O. E.; Nguyen, V. N.; Nguyen, H. B.; Nguyen, T. A.; MacLean, E. L.-H.; Fox, G. J.; Behr, M. A.

2026-04-27 infectious diseases 10.64898/2026.04.20.26351312 medRxiv

Top 1%

0.6%

Background: Vietnam is a top 20 burden country for multi-drug resistant/rifampicin-resistant tuberculosis (MDR/RR-TB), with nearly 10,000 cases a year. With the emergence of new diagnostic assays for M. tuberculosis and resistance, along with new drugs for both treatment and prevention, we sought to better understand the molecular epidemiology of RR-TB in this high-burden setting, through the study of clinical trial isolates from the VQUIN MDR trial. Methods: We assembled a sample of cultured isolates, collected from patients with confirmed RR-M. tuberculosis within 10 provinces, enriching for isolates from outside of the 2 major cities, Hanoi and Ho Chi Minh City. We subjected these isolates to whole genome sequencing (WGS) and bioinformatic analysis, with a subset subject to phenotypic drug susceptibility testing to evaluate phenotypic/genotypic concordance. New genome sequences were phylogenetically contextualised to publicly available M. tuberculosis genome sequences sampled in Vietnam from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). Results: Isolates from 252 RR-TB cases passed quality controls and were available for analysis. Xpert MTB/RIF had a high concordance with WGS-based rifampicin-resistance prediction (PPV=96.8%). Of the 244 isolates confirmed to be rifampicin resistant, a high proportion (235/244 = 96.3%) had mutations associated with resistance to at least one other first- or second-line antibiotic. Phenotypic drug susceptibility testing (DST) for rifampicin, isoniazid, and levofloxacin was completed for 77 isolates with a high concordance demonstrated between DST and genomic-based resistance predictions (67/77, 87.0% RIF; 76/77, 98.7% INH; 73/77, 94.8% LFX). High concordance was also observed with new and repurposed antibiotics linezolid (100%, 60/60), pretomanid (100%, 60/60), and bedaquiline (56/60, 93.3%). Rifampicin-resistant strains were more likely to be lineage 2.2.1, compared to rifampicin-susceptible M. tuberculosis strains in Vietnam, particularly in the major cities. Conclusions: The high prevalence of secondary drug-resistance beyond RIF and INH, along with the dominance of one major lineage across geographic regions, provides insights on the spread of MDR/RR-TB in Vietnam and reinforces the importance of prompt and broad detection of drug-resistance to inform the timely initiation of effective drug regimens.
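The concordance percentages in this abstract follow directly from the reported counts, which a short check makes explicit. The sketch below uses only the numerators and denominators stated above. The dictionary layout and the name `percent_agreement` are illustrative choices, not part of the preprint.

```python
def percent_agreement(agree: int, total: int) -> float:
    """Percentage of isolates where phenotype and genotype agree, to one decimal."""
    return round(100 * agree / total, 1)


# (concordant isolates, isolates tested) as reported in the abstract.
counts = {
    "rifampicin": (67, 77),
    "isoniazid": (76, 77),
    "levofloxacin": (73, 77),
    "linezolid": (60, 60),
    "pretomanid": (60, 60),
    "bedaquiline": (56, 60),
}
concordance = {drug: percent_agreement(a, n) for drug, (a, n) in counts.items()}
for drug, pct in concordance.items():
    print(f"{drug}: {pct}%")
```

Each recomputed value matches the percentage printed in the abstract, which suggests the figures were rounded to one decimal place from these counts.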

15

Analytical performance of a multi-target open real-time PCR assay for simultaneous detection of tuberculosis, non-tuberculous mycobacteria, and drug resistance in a high-burden setting

Sidiq, Z.; Tyagi, P.; Anand, A.; Dwivedi, K. K.; Rajpal, S.; Chopra, K. K.

2026-04-24 infectious diseases 10.64898/2026.04.23.26351557 medRxiv

Top 1%

0.5%

Background: Timely diagnosis of tuberculosis and drug resistance remains a cornerstone of effective disease control. Multiplex open molecular platforms capable of simultaneously detecting Mycobacterium tuberculosis complex (MTBc), non-tuberculous mycobacteria (NTM), and resistance to first-line anti-tuberculosis drugs could streamline diagnostic pathways. Methods: We conducted a laboratory-based evaluation of two multiplex real-time PCR assays (MTBc/NTM R-Gene and MTB-RIF/INH R-Gene) using 300 well-characterized samples, including 150 MTBc-positive culture isolates (including rifampicin-resistant, isoniazid-resistant, and drug-susceptible strains) and 150 MTBc-negative samples (50 NTM isolates and 100 mycobacteria-negative specimens). Composite reference standards included culture, MPT64 antigen testing, and line probe assay corroborated by phenotypic drug susceptibility testing for resistance profiling, with NTM speciation performed using a dedicated line probe assay. DNA extraction was performed using the QIAamp DNA Mini Kit (QIAGEN, Germany), followed by amplification on a real-time PCR platform according to manufacturer instructions. The diagnostic performance was assessed against composite reference standards. Results: The analytical performance for detecting MTBc demonstrated 100% sensitivity and specificity (150/150). NTM detection showed 70.0% sensitivity (35/50) and a specificity of 100%, highlighting limitations in coverage of NTM species. Rifampicin resistance was detected with a sensitivity of 96.0% (48/50) and specificity of 100%, whereas isoniazid resistance detection was 100% sensitive and specific (50/50). Agreement with established reference standards was high (κ = 0.76-1.00) within this analytical context. Interpretation: This analytical validation demonstrates that multiplex open real-time PCR assays can accurately and simultaneously detect MTBc, NTM, and rifampicin and isoniazid resistance using culture isolates. While these platforms offer potential advantages in flexibility and expanded resistance profiling, additional studies on clinical diagnostic accuracy, cost-effectiveness analyses, and operational feasibility are required to determine their practical utility and programmatic impact in high-burden settings.
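
The sensitivity figures in this abstract follow directly from the reported counts. The following is a minimal Python sketch that recomputes them and attaches a 95% Wilson score interval to each. The interval is an illustrative addition that the abstract does not report, and the function names are hypothetical.

```python
import math


def proportion(hits: int, total: int) -> float:
    """Fraction of reference-classified samples that the assay called correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


def wilson_interval(hits: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = proportion(hits, total)
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Sensitivity counts as reported in the abstract (detected / reference-positive).
reported = {
    "MTBc": (150, 150),
    "NTM": (35, 50),
    "Rifampicin resistance": (48, 50),
    "Isoniazid resistance": (50, 50),
}

for target, (hits, total) in reported.items():
    low, high = wilson_interval(hits, total)
    print(f"{target}: {proportion(hits, total):.1%} (95% CI {low:.1%} to {high:.1%})")
```

With only 50 isolates per resistance group, the intervals are wide, which is one way to read the abstract's call for further clinical accuracy studies.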

16

Leveraging Open-Source Solutions to Build a Low-Cost Digital Pathology Pipeline for Translational Research

Stenberg, J.; Gullapalli, A.; Foucar, K.; Babu, D.; Redemann, J.; Joste, N.; Foucar, C.; Gratzinger, D.; George, T.; Ohgami, R.; Gullapalli, R. R.

2026-04-27 pathology 10.64898/2026.04.25.26350240 medRxiv

Top 1%

0.5%

Digital Pathology (DP) is a fast-emerging branch of pathology focused on digitizing pathology data. Key challenges of DP adoption for pathology laboratories, especially small to mid-sized clinical labs, are the upfront costs of instrumentation and the logistics of implementation. In the current project, we built an end-to-end DP solution using low-cost, open-source components that is user-friendly at a small scale. We repurposed readily available microscopy components in a pathology lab to assemble a fully functional DP pipeline for translational research applications. We tested multiple low-cost complementary metal-oxide semiconductor (CMOS) cameras in this project and chose a user-friendly Canon camera for image acquisition. An open-source DP server solution, OMERO v.5.6.4, was used as the image management system (IMS) to host and serve the whole-slide images (WSIs) on an Ubuntu 22.04 operating system. The server-hosted WSIs were evaluated remotely and asynchronously by multiple pathologists located in Albuquerque, NM; Salt Lake City, UT; and Palo Alto, CA. Each pathologist assessed the quality of the WSI pipeline, image quality, and WSI interaction experience using a 23-question survey. Overall, the pathologists rated the custom, low-cost WSI pipeline as robust and user-friendly. The current DP setup is unlikely to be useful as a commercial, scalable DP pipeline for large-scale clinical applications. However, it demonstrates the feasibility of creating customized, small-scale DP solutions (at a low price point) for asynchronous translational pathology research applications. Additionally, building customized DP pipelines provides excellent educational opportunities for pathology residents to gain in-depth knowledge of the various technical elements of a DP workflow. In summary, we have established a low-cost, end-to-end WSI DP pipeline useful for spatiotemporally asynchronous translational pathology research in an academic setting.

17

Recovery of genomic and transcriptomic profiles from decades-old FFPE brain tissues

Robinson Christiansen, C.; Hansen Firoozfard, E.; Oskolkov, N.; Gilbert, M. P. T.; Mak, S. S. T.; Wirendfeldt, M.; Kjaer, C.; Marmol-Sanchez, E.

2026-04-22 molecular biology 10.64898/2026.04.20.719637 medRxiv

Top 1%

0.5%

Neurological, neurodegenerative, and psychiatric disorders impose substantial morbidity and disability worldwide, yet their molecular basis remains incompletely understood, in part due to limited access to human brain tissue. The Danish Brain Collection, comprising brains from individuals who lived in Danish psychiatric institutions from the 1940s to the 1980s, represents a unique but largely untapped resource for retrospective molecular investigation. Here, we assess the feasibility of extracting and sequencing DNA and RNA from decades-old FFPE brain tissue. We systematically evaluate how extraction and library preparation strategies influence nucleic acid yield and quality, and show that RNA end-repair prior to library preparation substantially enhances transcript diversity, improving data quality from highly degraded samples. Despite extensive fragmentation, we recover biologically informative transcriptomic profiles, including protein-coding and microRNA expression profiles that retain clear tissue specificity. These results establish the Danish Brain Collection as a viable resource for genomic and transcriptomic analyses and demonstrate the broader potential of archival FFPE tissues for large-scale molecular studies.

18

Practical quantification of immunohistochemistry antigen concentrations and reaction-diffusion parameters

Peale, F. V.; Perng, W.; Mbiribindi, B.; Andrews, B. T.; Wang, X.; Dunlap, D.; Eastham, J.; Ngu, H.; Chernyshev, A.; Orlova, D.

2026-04-21 pathology 10.64898/2026.04.16.719078 medRxiv

Top 1%

0.5%

The immunohistochemistry (IHC) methods widely used in diagnostic medicine and biomedical research are kinetically complex reaction-diffusion processes that, ideally, produce stain intensities correlated with the local antigen concentration. Yet after 75 years of use, practical theoretical tools to rigorously plan and interpret IHC experiments are still lacking. Because modeling the reactions requires time-consuming computer simulation, impractical for regular use, most protocols are optimized empirically, without detailed knowledge of the reaction rates and antigen-antibody equilibria. The resulting stain intensities can be calibrated against standards with known antigen abundance, but they are typically not interpretable in terms of chemical antigen concentrations. To address these limitations, we developed a fast interpolation method to model reaction-diffusion behavior, and experimental methods to characterize IHC kinetic parameters in formalin-fixed paraffin-embedded (FFPE) samples. Used together, these allow experimental measurement of both the chemical concentration of antigen in the sample and the reaction-diffusion parameters consistent with the assay results. Results show 1) direct immunofluorescent detection has low nanomolar sensitivity with >1000-fold dynamic range, and 2) antibody diffusion rates in FFPE samples can be >1000-fold slower than in aqueous solutions, producing diffusion-limited conditions in which the IHC reaction time course may depend on the sample antigen concentration. Awareness of these details is necessary to avoid potential underestimation of both the absolute and relative antigen concentrations in different samples that may occur if staining is stopped before reaching equilibrium. Software tools are provided to allow users to rapidly model IHC reaction time courses and to fit experimental time course data with candidate reaction parameters. 
The principles described here apply equally to other tissue-based "spatial omics" analyses and should be considered when designing and interpreting experiments requiring any macromolecule to diffuse into and react in a tissue section. Significance: The theoretical and experimental framework described here advances IHC staining from a qualitative or semi-quantitative method towards a more rigorously quantitative assay. The practical ability to predict IHC reaction kinetics and fit reaction parameters to experimental data has the potential to advance IHC applications in diagnostic medicine and biomedical research in three ways: 1) interpretation of experimental and diagnostic samples stained under different conditions can be more objective, facilitating comparison of results from different protocols and different laboratories; 2) IHC staining can be interpreted as molar chemical antigen-antibody concentrations calculated from the reaction parameters measured in the studied sample; 3) the correlation between antigen concentration and biological behavior can be examined more reliably. Practical software tools are provided.
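
The warning about stopping staining before equilibrium can be illustrated with a deliberately simplified model. The sketch below assumes well-mixed, reversible binding with antibody in excess (pseudo-first-order kinetics) and ignores diffusion entirely, so it is not the authors' interpolation method or their software, and it cannot reproduce the diffusion-limited dependence on antigen concentration that the abstract describes. The function name and all parameter values are hypothetical.

```python
import math


def bound_fraction(t_s: float, antibody_nM: float,
                   k_on_per_nM_s: float, k_off_per_s: float) -> float:
    """Fraction of antigen sites occupied after t_s seconds.

    Solves dB/dt = k_on * A * (1 - B) - k_off * B with B(0) = 0 and the free
    antibody concentration A held constant (antibody in excess, no diffusion).
    """
    k_obs = k_on_per_nM_s * antibody_nM + k_off_per_s    # observed rate constant
    equilibrium = k_on_per_nM_s * antibody_nM / k_obs    # equals A / (A + Kd)
    return equilibrium * (1.0 - math.exp(-k_obs * t_s))


# Hypothetical parameters, chosen only for illustration.
K_ON = 1e-4      # per nM per second
K_OFF = 1e-4     # per second, so Kd = 1 nM
ANTIBODY = 10.0  # nM

for minutes in (10, 30, 120):
    occupied = bound_fraction(minutes * 60, ANTIBODY, K_ON, K_OFF)
    print(f"{minutes:>4} min: {occupied:.3f} of antigen sites occupied")
```

Even in this idealized case, a short incubation reads out well below the equilibrium occupancy; the slower antibody diffusion reported for FFPE tissue would be expected to widen that gap further.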

19

Outcome Prediction Models for Critically Ill Patients Using Small Routine Laboratory Datasets

Cao, X.; Hou, J.; Wei, X.; Wang, Q.

2026-04-27 emergency medicine 10.64898/2026.04.26.26351758 medRxiv

Top 1%

0.4%

We present a suite of foundational outcome prediction models for critically ill patients, developed using readily available, routine blood tests and advanced machine learning techniques. The model inputs include complete blood counts (CBCs), metabolic panels, and additional biomarkers that assess liver and kidney function, coagulation status, and cardiac injury. The output is the predicted outcome at a given future horizon. For diagnoses, the horizon is set to zero; for prognoses, it is set to a fixed time interval. The training dataset in this study comprises clinical data from 332 ICU patients, augmented with 200 synthetic samples generated via a conditional diffusion model. Generative machine-learning-based data imputation and augmentation approaches yielded modest gains in predictive accuracy. However, substantial performance improvements were achieved through additional methods, including dimensionality and order reduction, SHAP-based feature importance analysis, and a novel time-series-to-image encoding strategy that enables the use of image-based classifiers for temporal clinical data. Principal component analysis-based order reduction produced measurable gains in outcome prediction, while the time-series-to-image encoding proved particularly effective in mitigating small-data limitations common in clinical research. Across all evaluation metrics (accuracy, precision, recall, F1 score, and AUROC), the prognostic models achieved performance exceeding 85%, with some models attaining AUROC scores above 90%. We also developed a new model ensemble approach that improves overall prediction, pushing all assessment metrics over 90%. This work establishes a robust and interpretable AI-enabled diagnostic and prognostic toolkit for outcome predictions in critically ill patients and demonstrates a scalable workflow for developing high-performing models from sparse healthcare datasets. The proposed framework is readily deployable in ICU environments with routine blood testing capabilities and serves as a foundation for future integration into digital twin systems for critical care.
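
The abstract does not specify how its time-series-to-image encoding works. As a general illustration of the idea only, the sketch below implements a Gramian angular summation field, a well-known published encoding that is swapped in here as an assumption and is not the authors' novel method. The function names and the sample values are hypothetical.

```python
import math


def rescale(series: list[float]) -> list[float]:
    """Linearly map a series onto [-1, 1]; a constant series maps to zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]


def gramian_angular_field(series: list[float]) -> list[list[float]]:
    """Encode a 1-D series as an n x n image with entries cos(phi_i + phi_j)."""
    # Clip to guard against rounding error before taking arccos.
    scaled = [max(-1.0, min(1.0, x)) for x in rescale(series)]
    angles = [math.acos(x) for x in scaled]
    return [[math.cos(a + b) for b in angles] for a in angles]


# Hypothetical lab values at six consecutive time points.
values = [1.2, 1.8, 2.9, 4.1, 3.3, 2.0]
image = gramian_angular_field(values)

for row in image:
    print(" ".join(f"{pixel:+.2f}" for pixel in row))
```

Each pixel combines two time points, so a short series of n measurements becomes an n x n grid that a standard image classifier can consume.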

20

A Cross-Cohort Validated Plasma Lipid Biomarker Assay for Early Breast Cancer Detection Using Machine Learning

Huang, T.; Koch, F. C.; Peake, D. A.; Adam, K.-P.; David, M.; Li, D.; Heffernan, K.; Lim, A.; Hurrell, J. G.; Preston, S.; Baterseh, A.; Vafaee, F.

2026-04-23 oncology 10.64898/2026.04.23.26351564 medRxiv

Top 1%

0.4%

Early detection of breast cancer remains essential for improving clinical outcomes, and complementary non-invasive approaches are needed to support existing screening methods, particularly for women with dense breast tissue. We have previously reported plasma lipid biomarker discovery using untargeted high-resolution liquid chromatography tandem mass spectrometry (LC-MS/MS). In this study, we performed biomarker confirmation and developed machine-learning models applied to targeted plasma lipid measurements for the non-invasive detection of early-stage breast cancer across international cohorts with independent external validation. Targeted LC-MS/MS was used to quantify candidate lipid panels in plasma samples from European discovery cohorts (n = 554) and an independent Australian cohort (n = 266) used for external validation. Data-driven feature selection identified a 15-lipid panel with strong performance in European cohorts (AUC >= 0.94). External validation prior to confidence stratification yielded 76% sensitivity, 64% specificity, and an AUC of 0.81 in the Australian validation cohort. Clinical assay development requires iterative panel and model testing to support translational feasibility and performance in the intended-use population. An analytically viable panel, excluding lipids requiring complex and costly synthesis, achieved comparable accuracy with improved assay robustness. Confidence-based analysis showed enhanced performance for predictions made with moderate to high confidence, with sensitivity up to 89% and AUC up to 0.85, suggesting that ongoing research should focus on strategies to enhance diagnostic model confidence. Importantly, model predictions were independent of breast density, tumour size, grade, subtype, and morphology, indicating biological specificity of the lipid signature. These results demonstrate that calibrated machine-learning models applied to plasma lipid biomarkers can support non-invasive breast cancer detection. 
Expanding training datasets to include greater diversity will further improve performance in the ongoing development of this lipid-based detection approach.
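
The confidence-based analysis can be pictured as re-scoring the classifier on only those predictions that fall far enough from the decision boundary. The sketch below is an assumption about how such stratification might be done: defining confidence as the distance of a predicted probability from 0.5 is a common convention and is not stated in the abstract, and the labels and probabilities are invented for illustration.

```python
def confidence(prob: float) -> float:
    """Map a predicted probability to [0, 1]: 0 at the boundary, 1 when certain."""
    return abs(prob - 0.5) * 2.0


def sensitivity_specificity(labels, probs, min_confidence=0.0, threshold=0.5):
    """Score only predictions whose confidence is at least min_confidence.

    Returns (sensitivity, specificity); either is None if no samples remain.
    """
    tp = fn = tn = fp = 0
    for label, prob in zip(labels, probs):
        if confidence(prob) < min_confidence:
            continue  # prediction too close to the boundary; leave it unscored
        predicted_positive = prob >= threshold
        if label == 1:
            tp += int(predicted_positive)
            fn += int(not predicted_positive)
        else:
            fp += int(predicted_positive)
            tn += int(not predicted_positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Invented toy data: 1 = cancer, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.90, 0.80, 0.55, 0.45, 0.10, 0.25, 0.60, 0.48]

print("all predictions:     ", sensitivity_specificity(labels, probs))
print("confidence >= 0.4:   ", sensitivity_specificity(labels, probs, min_confidence=0.4))
```

Filtering by confidence trades coverage for accuracy: the retained subset is scored more reliably, but the low-confidence samples still need some other form of follow-up.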